MINOR: add wait_for_assigned_partitions to console-consumer #8192

Merged
merged 4 commits into from
Mar 1, 2020

Conversation

brianbushree
Contributor

what/why

the throttling_test was broken by PR #7785, since the test depends on the consumer having partitions assigned before the producer starts

this PR provides the ability to wait for partitions to be assigned in the console consumer before considering it started.
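
A minimal sketch of the ordering this change is meant to enforce: the consumer is only treated as started once it reports assigned partitions, and the producer starts after that. `FakeConsumer`, `has_assigned_partitions`, and `start_consumer_then_producer` are hypothetical stand-ins for illustration, not the kafkatest services, and the local `wait_until` is a simplified helper written for this sketch rather than ducktape's own.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.01, err_msg=""):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if condition():
            return
        time.sleep(backoff_sec)
    raise RuntimeError(err_msg)


class FakeConsumer(object):
    """Hypothetical consumer that is assigned partitions after a few polls."""

    def __init__(self, polls_until_assigned=3):
        self._polls_left = polls_until_assigned
        self.started = False

    def has_assigned_partitions(self):
        # Each call simulates one check of the consumer's assignment state.
        self._polls_left -= 1
        return self._polls_left <= 0

    def start(self, timeout_sec=60):
        # Only report the consumer as started once partitions are assigned.
        wait_until(self.has_assigned_partitions, timeout_sec,
                   err_msg="consumer was not assigned partitions within %d seconds" % timeout_sec)
        self.started = True


def start_consumer_then_producer(consumer, events):
    """Record the order in which the two sides become active."""
    consumer.start()                    # blocks until partitions are assigned
    events.append("consumer_ready")
    events.append("producer_started")   # safe to produce from here on
    return events
```

With this shape, a slow assignment surfaces as a timeout in `start()` instead of as messages the consumer never had a chance to read.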

caveat

the wait_until_partitions_assigned flag cannot be combined with starting up the JmxTool inside the console-consumer for custom metrics, since the code assumes one JmxTool running per node.

I think a proper fix for this would be to make JmxTool its own standalone single-node service

alternatives

we could use the EndToEnd test suite, which uses the verifiable producer/consumer under the hood, but I found that more changes were necessary to get this working (specifically, that test suite doesn't seem to play nicely with the ProducerPerformanceService)

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@brianbushree brianbushree changed the base branch from 2.4 to trunk February 28, 2020 21:24
@@ -273,8 +276,25 @@ def _worker(self, idx, node):
        with self.lock:
            self.read_jmx_output(idx, node)

    def _wait_until_partitions_assigned(self, node, timeout_sec=60):
        if self.jmx_object_names is not None:
            raise Exception("'wait_until_partitions_assigned' is not supported while using 'jmx_object_names'/'jmx_attributes'")
Contributor Author

this is not ideal, but supporting multiple JmxTools running at once requires more refactoring

@brianbushree
Contributor Author

just confirmed that this doesn't break fetch_from_follower_test

@brianbushree brianbushree changed the title from "add wait_for_assigned_partitions to console-consumer" to "[MINOR] add wait_for_assigned_partitions to console-consumer" Feb 28, 2020
@brianbushree brianbushree changed the title from "[MINOR] add wait_for_assigned_partitions to console-consumer" to "MINOR: add wait_for_assigned_partitions to console-consumer" Feb 28, 2020
Member

@bbejeck bbejeck left a comment

This LGTM, but since I'm not too familiar, I'd like to get a second pair of eyes on this.

one of @hachikuji, @mattwong949, @gardnervickers

tests/kafkatest/tests/core/throttling_test.py (outdated, resolved)
@brianbushree
Contributor Author

hit one timeout with --repeat 3; going to try increasing the timeout and running it again

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.6
session_id:       2020-02-28--011
run time:         41 minutes 34.257 seconds
tests run:        6
passed:           5
failed:           1
ignored:          0
================================================================================
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   6 minutes 50.193 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     FAIL
run time:   4 minutes 54.465 seconds


    consumer was not assigned partitions within 60 seconds
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 132, in run
    data = self.run_test()
  File "/usr/local/lib/python2.7/dist-packages/ducktape/tests/runner_client.py", line 189, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/mark/_mark.py", line 428, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/opt/kafka-dev/tests/kafkatest/tests/core/throttling_test.py", line 174, in test_throttled_reassignment
    lambda: self.reassign_partitions(bounce_brokers, self.throttle))
  File "/opt/kafka-dev/tests/kafkatest/tests/produce_consume_validate.py", line 97, in run_produce_consume_validate
    self.start_producer_and_consumer()
  File "/opt/kafka-dev/tests/kafkatest/tests/produce_consume_validate.py", line 53, in start_producer_and_consumer
    self.consumer.start()
  File "/usr/local/lib/python2.7/dist-packages/ducktape/services/service.py", line 234, in start
    self.start_node(node)
  File "/opt/kafka-dev/tests/kafkatest/services/console_consumer.py", line 297, in start_node
    self._wait_until_partitions_assigned(node)
  File "/opt/kafka-dev/tests/kafkatest/services/console_consumer.py", line 292, in _wait_until_partitions_assigned
    err_msg="consumer was not assigned partitions within %d seconds" % timeout_sec)
  File "/usr/local/lib/python2.7/dist-packages/ducktape/utils/util.py", line 41, in wait_until
    raise TimeoutError(err_msg() if callable(err_msg) else err_msg)
TimeoutError: consumer was not assigned partitions within 60 seconds

--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   6 minutes 59.611 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     PASS
run time:   7 minutes 49.718 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   6 minutes 42.782 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     PASS
run time:   8 minutes 17.261 seconds
--------------------------------------------------------------------------------

@bbejeck
Member

bbejeck commented Feb 28, 2020

Ran on branch builder (the 2.4 branch PR) with 5 repeats, all tests passed

http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2020-02-28--001.1582930027--brianbushree--throttle-test-fix--3a3d718/report.html

re-running the trunk PR version.

Contributor

@mattwong949 mattwong949 left a comment

LGTM, ran locally 3 times and it passed all of them

jmx_tool.start_jmx_tool(self.idx(node), node)
jmx_tool.read_jmx_output(self.idx(node), node)
assigned_partitions_jmx_attr = "kafka.consumer:type=consumer-coordinator-metrics,client-id=%s:assigned-partitions" % self.client_id
wait_until(lambda: assigned_partitions_jmx_attr in jmx_tool.maximum_jmx_value,
Contributor Author

we should be re-running read_jmx_output before checking this condition
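
A rough sketch of the point in this comment: if the polled condition only inspects a cached dictionary, it can never become true unless something refreshes that cache, so the refresh belongs inside the condition itself. `FakeJmxReader`, `partitions_assigned`, and `wait_for_assignment` are hypothetical names made up for this illustration, and the zero-argument `read_jmx_output` here is a simplification of the real method, which takes `idx` and `node`.

```python
import time

# Attribute name format taken from the snippet under review.
ATTR = "kafka.consumer:type=consumer-coordinator-metrics,client-id=%s:assigned-partitions"


class FakeJmxReader(object):
    """Hypothetical stand-in for a JMX reader that caches parsed metrics."""

    def __init__(self, client_id, reads_until_visible=3):
        self.maximum_jmx_value = {}
        self._attr = ATTR % client_id
        self._reads_left = reads_until_visible

    def read_jmx_output(self):
        # Simulates parsing the tool's output; the metric appears on a later read.
        self._reads_left -= 1
        if self._reads_left <= 0:
            self.maximum_jmx_value[self._attr] = 1.0


def partitions_assigned(reader, attr):
    """Refresh the cached metrics first, then check for the attribute."""
    reader.read_jmx_output()
    return attr in reader.maximum_jmx_value


def wait_for_assignment(reader, attr, timeout_sec=60, backoff_sec=0.01):
    """Poll until the attribute shows up; return False on timeout."""
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if partitions_assigned(reader, attr):
            return True
        time.sleep(backoff_sec)
    return False
```

Checking `attr in reader.maximum_jmx_value` alone, without the refresh, would keep seeing the same empty dictionary on every poll in this sketch.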

@brianbushree
Contributor Author

they're passing for me with --repeat 3

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.7.6
session_id:       2020-02-28--014
run time:         41 minutes 20.200 seconds
tests run:        6
passed:           6
failed:           0
ignored:          0
================================================================================
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   7 minutes 2.933 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     PASS
run time:   6 minutes 46.708 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   6 minutes 56.677 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     PASS
run time:   6 minutes 54.640 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=False
status:     PASS
run time:   6 minutes 56.818 seconds
--------------------------------------------------------------------------------
test_id:    kafkatest.tests.core.throttling_test.ThrottlingTest.test_throttled_reassignment.bounce_brokers=True
status:     PASS
run time:   6 minutes 42.204 seconds
--------------------------------------------------------------------------------

@bbejeck
Member

bbejeck commented Feb 29, 2020

Ok to test

@bbejeck
Member

bbejeck commented Feb 29, 2020

previous branch builder test run failed.

Re-running the test now. If the test passes, I'll merge this and cherry-pick back.

@bbejeck
Member

bbejeck commented Mar 1, 2020

2nd branch builder run passed.

@bbejeck bbejeck merged commit 72a5aa8 into apache:trunk Mar 1, 2020
@bbejeck
Member

bbejeck commented Mar 1, 2020

merged #8192 into trunk

Thanks @brianbushree!

bbejeck pushed a commit that referenced this pull request Mar 1, 2020
Reviewers: Mathew Wong <[email protected]>, Bill Bejeck <bbejeck.com>
@bbejeck
Member

bbejeck commented Mar 1, 2020

cherry-picked to 2.4

qq619618919 pushed a commit to qq619618919/kafka that referenced this pull request May 12, 2020
@brianbushree brianbushree deleted the throttle-test-fix branch August 27, 2021 23:39